1. INTRODUCTION

1.1 Belief Revision

The most frequently discussed method of revising a subjective probability
distribution P to obtain a new distribution P*, based on the occurrence of
an event E, is Bayes' rule: P*(A) = P(AE)/P(E). Richard Jeffrey (1965, 1968)
has argued persuasively that Bayes' rule is not the only reasonable way to
update: use of Bayes' rule presupposes that both P(E) and P(AE) have been
previously quantified. In many instances this will clearly not be the case.
Consider the following example:

Coin Tossing. Suppose we are thinking about three tosses of a coin. Under
the usual circumstances a probability assignment is made on the eight
possible outcomes Ω = {000, 001, 010, 011, 100, 101, 110, 111}. Suppose an
informant, believed trustworthy, announces: "Oh, I see you're thinking about
that coin. I just spun it 100 times in the other room and it came up heads
80 times". This is clearly relevant information and we will obviously want
to revise our opinion. The information cannot be put in terms of the
occurrence of an event in the eight-point space Ω, and Bayes' rule is not
directly available. Among many possible approaches, four methods of
incorporating the information will be discussed.

a)  Complete  Reassessment. 

b)  Jeffrey's  Rule. 

c)  Retrospective  Conditioning. 

d)  Exchangeability. 

a) Complete Reassessment. In the absence of further structure it is
always possible to react to the new information by completely reassessing
P*, presumably using the same techniques used to quantify the original
distribution P.

b) Jeffrey's Rule. Suppose that the original probability assignment P was
exchangeable. That is, P(001) = P(010) = P(100) and P(110) = P(101) =
P(011). In the situation described, the information provided contains no
information about the order of the next three tosses and thus we may well
require that the new probability distribution remain exchangeable. This is
equivalent to considering a partition {E_i}, i = 0, ..., 3, where
E_0 = {000}, E_1 = {001, 010, 100}, E_2 = {110, 101, 011}, E_3 = {111}.
Here E_i is the set of outcomes with i ones, and exchangeability implies
that for any event A and any i, P(A|E_i) = P*(A|E_i). To complete the
probability assignment P*, a subjective assessment of P*(E_i) is needed.
Then P* is determined by

    P*(A) = Σ_i P*(A|E_i) P*(E_i) = Σ_i P(A|E_i) P*(E_i).

The rule

(1.1)    P*(A) = Σ_i P(A|E_i) P*(E_i)

is known in the philosophical literature as Jeffrey's rule of conditioning.
It is valid whenever there is a partition {E_i} of the sample space such
that

(J)    P*(A|E_i) = P(A|E_i) for all A and i.
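
Rule (1.1) can be sketched in a few lines of Python on the coin-tossing
example. This is only an illustration under stated assumptions: the uniform
prior is one convenient exchangeable assignment, the reassessed values
P*(E_i) are hypothetical numbers chosen for the example, and the function
name jeffrey_update is ours.

```python
from fractions import Fraction
from itertools import product


def jeffrey_update(prior, partition, new_block_probs):
    """Pointwise form of (1.1): P*(w) = P(w) / P(E_i) * P*(E_i) for w in E_i.

    Assumes every block has positive prior mass.
    """
    posterior = {}
    for block, p_star in zip(partition, new_block_probs):
        block_mass = sum(prior[w] for w in block)
        for w in block:
            posterior[w] = prior[w] / block_mass * p_star
    return posterior


# The eight outcomes of three tosses, with a uniform prior (an assumption).
omega = ["".join(bits) for bits in product("01", repeat=3)]
prior = {w: Fraction(1, 8) for w in omega}

# E_i = set of outcomes with exactly i ones.
partition = [[w for w in omega if w.count("1") == i] for i in range(4)]

# Hypothetical reassessed values P*(E_0), ..., P*(E_3); not from the text.
new_block_probs = [Fraction(1, 100), Fraction(9, 100),
                   Fraction(40, 100), Fraction(50, 100)]

posterior = jeffrey_update(prior, partition, new_block_probs)
```

Because the prior is constant on each E_i, the updated assignment is again
exchangeable: outcomes with the same number of ones receive equal mass.
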

c) Retrospective Conditioning. Some subjectivists have suggested trying to
analyze this kind of problem by momentarily disregarding the new
information, quantifying a distribution on a space Ω* rich enough to allow
ordinary conditioning to be used, and then using Bayes' rule. For some
discussion of this, see de Finetti (1972, Chap. 8) and Section 2.1 below. It
is worth emphasizing that this type of retrospective conditioning is an
extremely difficult psychological task; Fischhoff (1975), and Fischhoff and
Beyth (1975) have demonstrated that "reporting the outcome of a historical
event increases the perceived likelihood of that outcome", and Slovic and
Fischhoff (1977) have shown that "similar hindsight effects occur when
people evaluate the predictability of scientific results - they tend to
believe they 'knew all along' what the experiments would find". Nor, in
principle, is retrospective conditioning simpler than complete reassessment:
since P*(A) = P(AE)/P(E) in this case, assessment of P(AE) for each A is
equivalent to reassessment of P*(A).

d)  Exchangeability.  The  three  future  tosses  of  the  coin  may  be  regarded 
as  exchangeable  with  the  100  tosses  reported  by  our  informant.  Standard 
Bayesian  computations  can  then  be  used. 

Approaches b, c, and d are all special routes to the requantification
of approach a; each is valid or useful under different assumptions. For
example, Jeffrey's rule assumes the availability of a partition and the
validity of assumption (J). Retrospective conditioning assumes that one can
do a reasonable job of assessing probabilities as if the data had not been
observed. Exchangeability assumes that future trials are based on the same
mechanism as past ones; in the example this might not be reasonable:
perhaps the past trials were spins on a table, while the future trials are
tosses onto the floor.

In this paper we study the assumptions and conclusions that attend
Jeffrey's rule. Our main contributions are technical: In Section 2 we
connect Jeffrey's rule with sufficiency; Sections 3, 4, and 5 analyze
what happens when two or more partitions are considered. In Section 3 we
discuss commutativity of successive updating. In Section 4 we discuss
methods for dealing with two partitions simultaneously, giving a necessary
and sufficient condition for two probability measures on two algebras to
have a common extension. In Section 5 we discuss some other motivations
for Jeffrey's rule when condition (J) has not been subjectively checked.
Jeffrey's rule gives the "closest" measure to P which fixes P*(E_i),
and is related to the iterated proportional fitting procedure used in the
statistical analysis of contingency tables. For ease of exposition, most
of this paper assumes a countable state space or a countable partition.
In Section 6 we describe the mathematical machinery needed to
extend the previous results to abstract probability spaces.

1.2  Historical  and  Bibliographical  Note 

We  do  not  propose  to  survey  here  the  growing  philosophical  literature 
on  probability  revision  and  Jeffrey's  rule.  The  following  quotations  and 
references,  however,  should  make  clear  that  the  problem  was  early  recognized 
by  the  founders  of  modern  subjective  probability,  and  may  be  helpful  as  a 
guide  to  the  recent  literature. 

From the subjectivistic perspective, the conditional probability P(A|E)
is the probability we currently would attribute to an event A if in
addition to our present information we were also to learn E. In the language
of betting, it is "the probability that we would regard as fair for a bet
on A to be made immediately, but to become operative only if E occurs"
(de Finetti, 1972, p. 193; cf. Ramsey 1931, p. 180). In this formulation,
the equality P(A|E) = P(AE)/P(E) is not a definition, but follows as a
theorem derived from the assumption of coherence (de Finetti, 1975,
Chapter 4).


If  we  actually  learn  E  to  be  true,  it  is  conventional  to  adopt  as 
one's  new  probability 

(1.2)    P*(A) = P(A|E).

Assumption (1.2) seems entirely plausible: what else should our
probability of A be, given that we have learned E, and nothing else, other
than the probability which we were willing to attribute to A if we were
subsequently to learn E? Several authors have pointed out that (1.2) is
an assumption. Hacking (1967, p. 314) refers to (1.2) as the dynamic
assumption of personalism, to contrast it with the static nature of the
assumption of coherence. Hacking (1967, pp. 315-316) points out that
coherence in its usual sense does not entail (1.2), and de Finetti concedes
as much when he refers to an unexplained "criterion of temporal coherency"
(de Finetti, 1972, p. 150); cf. Ramsey (1931, p. 192), who similarly asserts
that "when my degrees of belief change in this way we can say that they have
been changed consistently by my observation". For two attempts at a partial
justification, however, see Freedman and Purves (1969), Teller (1976).

Ramsey  himself  perhaps  stated  the  difficulty  most  clearly: 

[The degree of belief in p given q] is not the same as the
degree to which [a subject] would believe p, if he believed
q for certain; for knowledge of q might for psychological
reasons profoundly alter his whole system of beliefs [Ramsey 1931,
p. 180; cf. however, p. 192].

Other reservations about the adequacy of conditionalization as an
exclusive model for belief revision center around its assumption about
the form in which new information is received. Indeed, Jeffrey's original
philosophical motivation for introducing probability kinematics was his
belief that "It is rarely or never that there is a proposition for which the
direct effect of an observation is to change the observer's degree of
belief in that proposition to 1" (Jeffrey 1968, p. 171). Similar criticisms
have been raised by Shafer (1979, 1981), whose theory of belief functions
is a more radical attempt to deal with the problem. Both hold that
conditioning on an event requires the assignment of an initial probability
for that event, prior (in principle at least) to its observation, and for
many classes of sensory experiences this seems forced, unrealistic, or
impossible.

For example, suppose we are about to hear one of two recordings of
Shakespeare on the radio, to be read by either Olivier or Gielgud, but are
unsure as to which, and have a prior with mass 1/2 on Olivier, 1/2 on
Gielgud. After hearing the recording, one might judge it fairly likely, but
by no means certain, to be by Olivier. The change in belief takes place by
direct recognition of the voice; all the integration of sensory stimuli has
already taken place at a subconscious level. To demand a list of objective
vocal features which we condition on in order to effect the change would
be a logician's parody of a complex psychological process.

Another issue is that our "[subjective] probabilities can change in
the light of calculations or of pure thought without any change in the
empirical data..." (Good 1977, p. 140). I. J. Good terms such probabilities
"evolving" or "dynamic" and has discussed them in a number of papers
(cf., e.g., Good 1950, p. 49; 1968; 1977). There are serious difficulties
in attempting to model such types of belief revision, particularly if, as
noted by Savage (1967, p. 308) and others, the new information is a logical
or mathematical consequence of the old. For recent progress in this
direction, see Garber (1982), Jeffrey (1982).

It is useful in considering these questions to distinguish between the
actual, practical application of Bayes' theorem and its use in modelling
successive revision in belief of a hypothetical "rational agent". As a
practical matter our new beliefs may bear little relation to our old ones;
modelling a process of change so general seems elusive. Assuming "temporal
coherence" results in a plausible description of belief revision with
interesting mathematical consequences (convergence of limiting frequencies,
asymptotic normality of posterior distributions, etc.). Jeffrey's rule
places fewer restrictions on the hypothesized form of belief revision,
yet retains enough structure to permit interesting conclusions to emerge
(hence the name "probability kinematics").

Jeffrey's rule was introduced in Jeffrey (1957) and is further discussed
in Jeffrey (1965, Chapter 11) and Jeffrey (1968). Isaac Levi (1967;
1970, pp. 147-152) is a vigorous critic of Jeffrey's version of probability
kinematics, but has been thoroughly rebutted by Jeffrey (1970, especially
at pp. 173-179). Jeffrey's idea was partially anticipated by the Oxford
astronomer Donkin (1851, p. 356); cf. Boole (1854, pp. 251-252), Whitworth
(1901, pp. 162-169, 181-182), Keynes (1921, pp. 176-177). An independent
proposal of Jeffrey's rule appears in Griffeath and Snell (1974). The
last few years have seen a sudden upsurge of interest in Jeffrey
conditionalization; papers have appeared by May and Harper (1976), Teller
(1976), Field (1978), Shafer (1981), Williams (1980), van Fraassen (1980),
Garber (1980), Domotor (1980), and Armendt (1980).

2. JEFFREY'S RULE OF CONDITIONING

In this section we develop some of the mathematics connected with
Jeffrey's rule of conditioning. Formally: Ω is a countable set, P and
P* are probability measures on the subsets of Ω, and {E_i} is a partition
of Ω.


2.1  Bayesian  Conditioning 

Jeffrey's rule of conditioning is a generalization of ordinary
conditioning: given the partition {E, E^c}, if P*(E) = 1 and
P*(A) = Σ_i P(A|E_i) P*(E_i), then P*(A) = P(A|E). We therefore begin by
investigating when one measure P* can arise from another measure P by
conditioning. To be precise, suppose P and P* are measures on a countable
space Ω. We will say that P* can be obtained from P by conditioning if there
exists a probability space (H, G, Q), and events {E_ω}, ω ∈ Ω, with
E_ω ∈ G (E_ω to be thought of as "ω occurred"), such that Q(E_ω) = P(ω),
and an event E ∈ G such that Q(E) > 0 and Q(E_ω | E) = P*(ω).


Theorem 2.1: P* can be obtained from P by conditioning if and only
if

(2.1)    P*(ω) ≤ B P(ω) for some constant B ≥ 1 and all ω.

Proof: If P* can be obtained from P by conditioning, let (H, G, Q),
{E_ω}, E be given. Then for any ω ∈ Ω,

    P*(ω) = Q(E_ω | E) = Q(E_ω E)/Q(E) ≤ Q(E_ω)/Q(E) = P(ω)/Q(E).

This gives (2.1) with B = 1/Q(E).

Conversely, suppose (2.1) is satisfied. Let H = Ω × {a,b} =
{(ω,a), (ω,b)}, ω ∈ Ω. Let G be the set of all subsets of H. Let
E_ω = (ω,a) ∪ (ω,b), and let E = ∪_ω (ω,a). Solving the problem of finding
Q formally leads to introducing a parameter t, 0 < t < 1 (t will turn
out to be Q(E)), and setting

    Q((ω,a)) = t P*(ω)

    Q((ω,b)) = P(ω) - t P*(ω).

Because (2.1) is satisfied, t can be chosen small enough so that
Q((ω,b)) ≥ 0. It is then straightforward to check that Q is a probability
on H satisfying Q(E_ω) = P(ω) and Q(E_ω | E) = Q((ω,a))/Σ_ω Q((ω,a)) = P*(ω)
as required. □
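
The construction in the converse half of the proof can be traced on a small
finite example. The following Python sketch is illustrative only: the two
distributions are hypothetical, the function name is ours, and the choice
t = 1/(2B) is just one admissible value (any t with 0 < t < 1 and t ≤ 1/B
keeps Q((ω,b)) nonnegative). It assumes supp(P*) is contained in supp(P).

```python
from fractions import Fraction


def conditioning_representation(P, P_star):
    """Build Q on H = Omega x {a, b} as in the converse of Theorem 2.1."""
    # Smallest constant B with P*(w) <= B * P(w) for all w.
    B = max(P_star[w] / P[w] for w in P if P_star[w] > 0)
    t = 1 / (2 * B)  # one admissible choice; t will equal Q(E)
    Q = {}
    for w in P:
        Q[(w, "a")] = t * P_star[w]
        Q[(w, "b")] = P[w] - t * P_star[w]
    return Q, t


# Hypothetical measures on a three-point space.
P = {"x": Fraction(1, 2), "y": Fraction(1, 3), "z": Fraction(1, 6)}
P_star = {"x": Fraction(1, 4), "y": Fraction(1, 4), "z": Fraction(1, 2)}

Q, t = conditioning_representation(P, P_star)

# E is the union of the points (w, a); E_w is {(w, a), (w, b)}.
Q_E = sum(Q[(w, "a")] for w in P)
recovered = {w: Q[(w, "a")] / Q_E for w in P}  # Q(E_w | E)
```

Here B = 3, so t = 1/6, and conditioning Q on E returns P* exactly.
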

Condition (2.1) places a restriction on P, P* when both have countable
support (but not when both have finite support and supp(P*) ⊂ supp(P)). For
example, no geometric distribution can be obtained from a Poisson
distribution by conditioning, but any Poisson distribution can be obtained
from any geometric. If Ω is uncountable, (2.1) can be replaced by the
conditions P* << P and dP*/dP ∈ L_∞; see Section 6.


2.2 Jeffrey Conditionalization and Sufficiency

In the example discussed in Section 1, the partition {E_i} naturally
arose in the course of constructing P* from P. But one might instead
envisage being given another person's {P, P*} and then trying to
reconstruct a possible partition {E_i} from which the pair {P, P*} could
have arisen via Jeffrey conditionalization. Unlike Bayesian
conditionalization, this turns out to be always possible.

To apply Jeffrey's rule, it is required to find a partition {E_i} such
that

    P(A|E_i) = P*(A|E_i) for all A and i.

This is simply the problem of finding a sufficient partition for the
two-element family {P, P*}; see Blackwell and Girshick (1954), Chapter 8.
This simple observation makes possible the translation of the ideas of
minimal sufficiency and likelihood ratio into the language of Jeffrey's
rule.

A partition {E_i} is said to be coarser than a second partition {E'_j} if
every E_i is a union of sets in {E'_j}. For purposes of updating
probability, a coarser partition has the advantage that P* need be
specified on fewer sets. A coarsest sufficient partition is said to be
minimal sufficient.

The following (well known) theorem gives an alternate version of Jeffrey's
rule and states that there is always a coarsest partition for which Jeffrey's
rule is valid. Some philosophical implications of this fact are discussed
by van Fraassen (1980).

Theorem 2.2: Let P, P* be probability measures with common support
on the countable set Ω. If {E_i} is a countable partition of Ω such
that P(A|E_i) = P*(A|E_i) for all subsets A and elements E_i of the
partition, then for each ω ∈ Ω,

(2.2)    P*(ω) = [P*(E_i)/P(E_i)] P(ω),  ω ∈ E_i.

If R = {x : P*(ω)/P(ω) = x, ω ∈ Ω}, and E_x = {ω : P*(ω)/P(ω) = x, ω ∈ Ω},
then {E_x : x ∈ R} is a minimal sufficient partition for {P, P*}.

Proof: The first statement is a version of the Fisher-Neyman
factorization theorem; for the second, see Blackwell and Girshick (1954,
p. 221). □

The following example illustrates the use of the likelihood ratio form
of Jeffrey's rule.

Example 2.1 (Whitworth 1901, pp. 167-168):

Question 138. A, B, C were entered for a race, and their respective
chances of winning were estimated at 2/11, 4/11, 5/11; but circumstances
come to our knowledge in favour of A, which raise his chance to 1/2;
what are now the chances in favour of B and C respectively?

Answer. A could lose in two ways, viz. either by B winning or by
C winning, and the respective chances of his losing in these ways were
a priori 4/11 and 5/11, and the chance of his losing at all was 9/11. But
after our accession of knowledge the chance of his losing at all becomes
1/2, that is, it becomes diminished in the ratio of 18 : 11. Hence the
chance of either way in which he might lose is diminished in the same
ratio. Therefore the chance of B winning is now

    4/11 × 11/18, or 4/18;

and of C winning

    5/11 × 11/18, or 5/18.

These are therefore the required chances.
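
Whitworth's computation is the likelihood ratio form (2.2) applied to the
partition {{A}, {B, C}}. A short Python sketch makes this explicit and then
recovers the minimal sufficient partition of Theorem 2.2 by grouping points
with equal ratio P*(ω)/P(ω). The function names are ours, and the code
assumes P and P* have common support.

```python
from fractions import Fraction
from collections import defaultdict


def jeffrey_ratio_update(P, partition, new_block_probs):
    """Formula (2.2): P*(w) = [P*(E_i) / P(E_i)] * P(w) for w in E_i."""
    P_star = {}
    for block, p_star in zip(partition, new_block_probs):
        ratio = p_star / sum(P[w] for w in block)
        for w in block:
            P_star[w] = ratio * P[w]
    return P_star


def minimal_sufficient_partition(P, P_star):
    """Group points by likelihood ratio P*(w)/P(w), as in Theorem 2.2."""
    blocks = defaultdict(list)
    for w in P:
        blocks[P_star[w] / P[w]].append(w)
    return dict(blocks)


# Whitworth's race: prior chances 2/11, 4/11, 5/11; A's chance raised to 1/2.
P = {"A": Fraction(2, 11), "B": Fraction(4, 11), "C": Fraction(5, 11)}
P_star = jeffrey_ratio_update(P, [["A"], ["B", "C"]],
                              [Fraction(1, 2), Fraction(1, 2)])

blocks = minimal_sufficient_partition(P, P_star)
```

The common ratio 11/18 on {B, C} is Whitworth's "diminished in the ratio of
18 : 11".
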


2.3 Generalization

There is a version of Jeffrey's rule which takes the support of the
measures P and P* into account. Call a point ω ∈ Ω a support point
of P if P(ω) > 0. Let supp(P) denote the set of support points of P.
In general, P and P* will not have the same support; indeed, with standard
conditioning supp(P*) is strictly smaller than supp(P). Clearly P*
will simply have to be freshly quantified on supp(P*) - supp(P). This
leads to the following generalized form of Jeffrey's rule:

(2.3) Suppose {E_i} is a partition of S = supp(P) ∩ supp(P*)
such that

(J)    P(ω|E_i) = P*(ω|E_i) for all ω ∈ S and all i.

Then for any set A,

    P*(A) = Σ_i P(A|E_i) P*(E_i) + P*(A ∩ (supp(P*) - supp(P))).

In what follows, we will assume that supp(P*) = supp(P). Then
Jeffrey's rule simplifies to the form P*(A) = Σ_i P(A|E_i) P*(E_i) as given
in (1.1). All the results we prove have straightforward modifications to the
general situation (2.3) by restricting attention to supp(P*) ∩ supp(P).

3. SUCCESSIVE UPDATING

In the usual applications of subjective probability, information builds
up by successive conditioning. In Bayesian conditionalization the order in
which new information is incorporated is irrelevant; in Jeffrey
conditionalization the situation is more complex.


3.1 The Problem

Consider an initial probability P which is Jeffrey updated to the new
probability P_ℰ based on a partition ℰ = {E_i}, i = 1, ..., e, and new
probabilities P*(E_i) = p_i, i = 1, 2, ..., e; clearly
P*(A|E_i) = P_ℰ(A|E_i) = P(A|E_i) holds for our new opinion. (P* denotes
our new opinion, however it is obtained: by Bayes' theorem, Jeffrey's rule,
complete requantification or whatever. P_ℰ denotes the specific updated
probability measure that results from Jeffrey conditionalization.) We then
decide to update based on ℱ = {F_j}, j = 1, ..., f, and P*(F_j) = q_j, and
indicate this order of updating by P_ℰℱ. To use Jeffrey's rule at the second
stage we must, of course, accept the (J) condition, so
P*(A|F_j) = P_ℰℱ(A|F_j) = P_ℰ(A|F_j). Clearly the order of updating
matters, since the current opinion dominates:


Example 3.1. ℰ = ℱ, i.e., our belief on the partition {E_i} changes
first to p_i and then to q_i. The first revision and second revision differ
and we currently believe P*(E_i) = q_i.

Example 3.2. Suppose that in a criminal case we are trying to decide
which of four defendants, called a, b, c, d, is a thief. We initially
think P(a) = P(b) = P(c) = P(d) = 1/4. Evidence is then introduced to show
that the thief was probably left-handed. The evidence does not demonstrate
that the thief was definitely left-handed but leads us to conclude that
P(thief left-handed) = .8. If a and b are the defendants who are
left-handed, then E_1 = {a,b}, E_2 = {c,d} and P_ℰ(E_1) = .8, P_ℰ(E_2) = .2.
If the only effect of the evidence was to alter the probability of
left-handedness, in the sense that P(A|E_i) = P_ℰ(A|E_i), then P_ℰ is
obtained from Jeffrey's rule as P_ℰ(a) = .4, P_ℰ(b) = .4, P_ℰ(c) = .1,
P_ℰ(d) = .1. Evidence is next presented that it is somewhat likely that the
thief was a woman. If the female defendants are a and c, then F_1 = {a,c},
F_2 = {b,d}. If P*(F_1) = .7 and Jeffrey updating is again judged
acceptable, then

    P_ℰℱ(a) = .56, P_ℰℱ(b) = .24, P_ℰℱ(c) = .14, P_ℰℱ(d) = .06.

If instead the evidence (F_1, .7), (F_2, .3) is presented first and
(E_1, .8), (E_2, .2) is presented second, is P_ℱℰ equal to P_ℰℱ? Example 3.1
shows that in general the order matters since the currently held opinion
governs; in this example the reader may check that order does not matter.
We now investigate why.
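
The check left to the reader in Example 3.2 can be carried out with exact
arithmetic. The following Python sketch applies Jeffrey's rule in both
orders; the helper jeffrey_update and the variable names are ours, and the
numbers are those of the example.

```python
from fractions import Fraction


def jeffrey_update(P, partition, new_block_probs):
    """Jeffrey's rule, pointwise: P*(w) = P(w) / P(block) * P*(block)."""
    updated = {}
    for block, target in zip(partition, new_block_probs):
        mass = sum(P[w] for w in block)
        for w in block:
            updated[w] = P[w] / mass * target
    return updated


P = {w: Fraction(1, 4) for w in "abcd"}

E = [["a", "b"], ["c", "d"]]          # left-handed / not left-handed
p = [Fraction(8, 10), Fraction(2, 10)]
F = [["a", "c"], ["b", "d"]]          # women / men
q = [Fraction(7, 10), Fraction(3, 10)]

P_E = jeffrey_update(P, E, p)
P_EF = jeffrey_update(P_E, F, q)                        # update on E, then F
P_FE = jeffrey_update(jeffrey_update(P, F, q), E, p)    # update on F, then E
```

Both orders give (.56, .24, .14, .06) on (a, b, c, d), so in this example the
two successive updates commute.
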

3.2 Commutativity

There are two aspects to successive updating:

The updating information at each stage:

(3.1)    {E_i, p_i}, i = 1, ..., e;  {F_j, q_j}, j = 1, ..., f;

the J condition at each stage:

(3.2)    P_ℰ(A|E_i) = P(A|E_i) and P_ℰℱ(A|F_j) = P_ℰ(A|F_j)

or, if updating is being considered in the other order,

    P_ℱ(A|F_j) = P(A|F_j) and P_ℱℰ(A|E_i) = P_ℱ(A|E_i).

The J condition is an internal or psychological condition which must be
checked or accepted at each stage. Mathematics has nothing to offer here.

Mathematics can be used to check if (3.1) is compatible with
commutativity. Since Jeffrey updating fixes the probabilities on the
partition (i.e., P_ℰℱ(F_j) = q_j and P_ℱℰ(E_i) = p_i), commutativity will
be possible only if

(3.3)    P_ℰℱ(E_i) = p_i and P_ℱℰ(F_j) = q_j for all i and j.

It turns out that this condition is sufficient:

Theorem 3.1: If (3.3) holds, then P_ℰℱ = P_ℱℰ.

In other words, whenever P_ℰℱ and P_ℱℰ both incorporate (3.1), they
actually coincide. Theorem 3.1 is an immediate consequence of Csiszar (1975,
Theorem 3.2) and its proof is omitted. Csiszar's theorem implies that the
common measure P_ℰℱ = P_ℱℰ is the "I-projection" of the original measure P
onto the set of measures which incorporate (3.1). We discuss I-projection
further in Section 6.

3.3  Jeffrey  Independence 

A second approach to the mathematical aspects of commutativity of
successive Jeffrey updating uses independence. Two partitions ℰ = {E_i},
ℱ = {F_j} such that P(E_i) > 0, P(F_j) > 0 for all i and j, are
P-independent if

(3.4)    P(E_i|F_j) = P(E_i) and P(F_j|E_i) = P(F_j), all i, j.

Independence says that conditioning on ℱ does not change the probabilities
on ℰ and vice versa. Analogously, we define:

(3.5) ℰ and ℱ are Jeffrey independent with respect to P, {p_i} and
{q_j} if P_ℰ(F_j) = P(F_j) and P_ℱ(E_i) = P(E_i) holds for all i and j.

(Briefly, "J-independent wrt {p_i}, {q_j}".) Thus Jeffrey independence says
that Jeffrey updating on ℰ with probabilities p_i does not change the
probability on ℱ, and similarly with ℰ and ℱ interchanged. The next
theorem shows the connection with commutativity.

Theorem 3.2: Let P, {E_i, p_i} and {F_j, q_j} be given. Then P_ℰℱ = P_ℱℰ
if and only if ℰ and ℱ are Jeffrey independent with respect to P, {p_i},
{q_j}.

Proof: Note that P_ℰℱ(A) = P_ℱℰ(A) for all events A if and only if

(3.6)    Σ_{i,j} [p_i q_j / (P_ℰ(F_j) P(E_i))] P(A E_i F_j)
             = Σ_{i,j} [p_i q_j / (P_ℱ(E_i) P(F_j))] P(A E_i F_j).

Choose A = E_{i0} F_{j0} to get

    P_ℰ(F_{j0}) P(E_{i0}) = P_ℱ(E_{i0}) P(F_{j0}) for all pairs i0, j0.

Keeping i0 fixed and summing over j0 yields

(3.7a)    P(E_{i0}) = P_ℱ(E_{i0});

similarly, fixing j0 and summing over i0 yields

(3.7b)    P_ℰ(F_{j0}) = P(F_{j0}).

Thus, ℰ and ℱ are Jeffrey independent with respect to P, {p_i}, {q_j}.
Conversely, if (3.7) holds, then

    P_ℰ(F_j) P(E_i) = P(F_j) P(E_i) = P_ℱ(E_i) P(F_j).

Using this equality shows that (3.6) holds and so P_ℰℱ = P_ℱℰ. □

Theorem 3.3: Two partitions ℰ and ℱ are P-independent if and only
if ℰ and ℱ are Jeffrey independent with respect to any update
probabilities {p_i} and {q_j}.

Proof: First suppose ℰ and ℱ are P-independent. Then

(3.8)    P_ℰ(F_j) = Σ_i P(F_j|E_i) p_i = Σ_i P(F_j) p_i = P(F_j).

To see the converse, suppose ℰ and ℱ are not P-independent. Then
there exist E_{i0} and F_{j0} such that P(F_{j0}|E_{i0}) ≠ P(F_{j0}). Pick
p_{i0} sufficiently close to 1. Then

    Σ_i P(F_{j0}|E_i) p_i ≠ P(F_{j0}),

and hence (3.8) entails P_ℰ(F_{j0}) ≠ P(F_{j0}). □


Example 3.3. (J-independence does not imply P-independence.) Suppose
P(E_i F_j) is given by the following table

             F_1    F_2    F_3  | P(E_i)
    E_1      1/4    1/8    1/8  |  1/2
    E_2      1/8     0     1/8  |  1/4
    E_3      1/8    1/8     0   |  1/4
    ----------------------------+-------
    P(F_j)   1/2    1/4    1/4  |   1

Then ℰ and ℱ are not independent, but update probabilities p, q exist
such that ℰ and ℱ are J-independent with respect to them (see below).

An efficient algorithm for checking J-independence, in this and other
examples, is the following. Let r_ij denote W. E. Johnson's coefficient of
dependence between E_i and F_j (cf. Keynes 1921, pp. 150-155), i.e.,

    r_ij = P(E_i F_j) / (P(E_i) P(F_j)),

and let R = (r_ij); since Σ_i r_ij p_i = P_ℰ(F_j)/P(F_j) and
Σ_j r_ij q_j = P_ℱ(E_i)/P(E_i), it follows that ℰ and ℱ are J-independent
with respect to {p_i}, {q_j} if and only if

(3.8)    Σ_i r_ij p_i = 1, all j;  Σ_j r_ij q_j = 1, all i.

In Example 3.3

        ( 1  1  1 )
    R = ( 1  0  2 )
        ( 1  2  0 )

and hence, if

    p = (p, (1-p)/2, (1-p)/2), 0 < p < 1, and
    q = (q, (1-q)/2, (1-q)/2), 0 < q < 1,

then pR = 1, Rq^t = 1; thus ℰ, ℱ are J-independent with respect to p, q.
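
The matrix check just described is easy to mechanize. The Python sketch
below computes R from the joint table of Example 3.3 and tests the two
conditions Σ_i r_ij p_i = 1 and Σ_j r_ij q_j = 1. The function names and the
particular vectors p, q are our own illustrative choices; the code assumes
all marginals are positive.

```python
from fractions import Fraction as Fr


def dependence_matrix(joint):
    """r_ij = P(E_i F_j) / (P(E_i) P(F_j)) from a joint probability table."""
    row_marg = [sum(row) for row in joint]
    col_marg = [sum(col) for col in zip(*joint)]
    return [[joint[i][j] / (row_marg[i] * col_marg[j])
             for j in range(len(col_marg))]
            for i in range(len(row_marg))]


def is_j_independent(joint, p, q):
    """Check sum_i r_ij p_i = 1 for all j and sum_j r_ij q_j = 1 for all i."""
    R = dependence_matrix(joint)
    cols_ok = all(sum(R[i][j] * p[i] for i in range(len(p))) == 1
                  for j in range(len(q)))
    rows_ok = all(sum(R[i][j] * q[j] for j in range(len(q))) == 1
                  for i in range(len(p)))
    return cols_ok and rows_ok


# Joint table P(E_i F_j) of Example 3.3.
joint = [[Fr(1, 4), Fr(1, 8), Fr(1, 8)],
         [Fr(1, 8), Fr(0),    Fr(1, 8)],
         [Fr(1, 8), Fr(1, 8), Fr(0)]]

R = dependence_matrix(joint)

# Members of the family (p, (1-p)/2, (1-p)/2): here p = 3/5 and q = 1/5.
p_good = (Fr(3, 5), Fr(1, 5), Fr(1, 5))
q_good = (Fr(1, 5), Fr(2, 5), Fr(2, 5))

# A vector outside that family, for contrast.
p_bad = (Fr(1, 2), Fr(1, 3), Fr(1, 6))

ok = is_j_independent(joint, p_good, q_good)
not_ok = is_j_independent(joint, p_bad, q_good)
```
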


Remark. It is not hard to show that if at least one of the two partitions
ℰ and ℱ has only two elements, then J-independence for some p, q pair is
equivalent to P-independence, and hence to J-independence for all p, q.

Lest the reader think that commutativity always occurs when (3.1) can
be incorporated, we conclude this section with an example which has
P_ℰℱ(E_i) = p_i (and of course P_ℰℱ(F_j) = q_j) but such that
P_ℱℰ(F_j) ≠ q_j.


1/2  1/2 


Suppose p_1 = p_2 = 1/2 and q_1 = 7/15, q_2 = 8/15. Then a simple
computation shows that P_ℰℱ(E_1) = 1/2 = P_ℱℰ(E_1), but P_ℱℰ(F_1) ≠ q_1.

4. COMBINING SEVERAL BODIES OF EVIDENCE


Suppose we undergo a complex of experiences that result in our
simultaneously adopting new degrees of belief on two partitions ℰ = {E_i}
and ℱ = {F_j}, say

(4.1)    P*(E_i) = p_i and P*(F_j) = q_j.

How should we revise our subjective probabilities so as to incorporate these
new beliefs? In general, the theory put forth by de Finetti has no neat
mathematical answer to this question: you just have to think about things
and quantify your opinion as best you can. In this section we discuss two
reasonable routes through this quantification procedure. The routes are
reasonable in the same sense that exchangeability is a reasonable thing to
consider when attempting to quantify probabilities on repeated events: the
circumstances which make them subjectively acceptable occur frequently. We
first discuss whether measures satisfying (4.1) exist and if so, how to
uniquely select one.

4.1 Coherence of P*

If we are to adopt the degrees of belief P* in (4.1), they must at
least be coherent, i.e., P* must be extendable to a probability measure.
Theorem 4.1 provides a simple necessary and sufficient condition for the
existence of such extensions. The proof, given below in the Appendix, gives
an efficient algorithm for computing P* when both partitions are finite.

Theorem 4.1: Let $\Omega$ be a countable set, $\mathcal{E} = \{E_i\}$ and $\mathcal{F} = \{F_j\}$ two partitions of $\Omega$, and $P$, $Q$ two probability measures on $\mathcal{E}$ and $\mathcal{F}$ respectively. There exists a probability measure $P^*$ on $\Omega$ such that (4.1) holds (with $p_i = P(E_i)$ and $q_j = Q(F_j)$) if and only if whenever disjoint sets $A$ and $B$ are given, with $A$ a union of elements of $\mathcal{E}$, $B$ a union of elements of $\mathcal{F}$,

(4.2)  $P(A) + Q(B) \le 1$.

Remark. Condition (4.2) is necessary but not sufficient for the conclusion of Theorem 4.1 to hold if $\Omega$ is uncountable.
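
For finite partitions, condition (4.2) can be checked directly. The sketch below is an illustration and is not taken from the text: it assumes the partitions are supplied as lists of Python sets over a common finite $\Omega$, and the name `is_coherent` is hypothetical. For each union $A$ of cells of $\mathcal{E}$ it pairs $A$ with the largest union $B$ of cells of $\mathcal{F}$ disjoint from $A$, which is the only $B$ that needs to be examined. The enumeration is exponential in the number of cells of $\mathcal{E}$, so it is a check of the condition rather than an efficient algorithm.

```python
from itertools import combinations

def is_coherent(E, F, p, q, tol=1e-12):
    """Brute-force check of condition (4.2) for finite partitions.

    E, F : lists of sets, each list partitioning the same finite Omega
    p, q : proposed new probabilities for the cells of E and of F
    """
    n = len(E)
    for r in range(n + 1):
        for S in combinations(range(n), r):
            # A is a union of cells of E (empty when S is empty)
            A = set().union(*(E[i] for i in S))
            pA = sum(p[i] for i in S)
            # mass of the largest union of cells of F disjoint from A
            qB = sum(q[j] for j, Fj in enumerate(F) if not (Fj & A))
            if pA + qB > 1 + tol:
                return False
    return True

# Hypothetical example: F[0] = {0} lies inside E[0] = {0, 1}.
E = [{0, 1}, {2, 3}]
F = [{0}, {1, 2, 3}]
print(is_coherent(E, F, [0.5, 0.5], [0.2, 0.8]))
print(is_coherent(E, F, [0.5, 0.5], [0.7, 0.3]))
```

In the second call the set $\{0\}$ would need probability 0.7 while the cell $\{0,1\}$ containing it has only 0.5, so the pair $A = E_2$, $B = F_1$ violates (4.2).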

4.2 Extending $P^*$

If (4.1) is coherent, it remains to

(4.3) choose a probability $P^*$ on the partition $\{E_i \cap F_j\}$ which agrees with (4.1);

(4.4) extend $P^*$ to all of $\Omega$.

If judged valid, the easiest way of accomplishing (4.3) is to use independence: $P^*(E_i \cap F_j) = P^*(E_i)\,P^*(F_j) = p_i q_j$.

Richard Jeffrey (1957, Chapter 4) has advocated another route from (4.1) to a final probability assignment: successive Jeffrey updating on $\mathcal{E}$ and $\mathcal{F}$. This raises two issues:

(4.5)  When  does  successive  updating  satisfy  (4.1)? 

(4.6)  When  is  successive  updating  reasonable? 

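Question (4.5) arises because $P_{\mathcal{E}\mathcal{F}}$ need not equal $P_{\mathcal{F}\mathcal{E}}$. Indeed, Example 3.4 provides a situation where (4.1) is coherent because $P_{\mathcal{E}\mathcal{F}}$ satisfies (4.1), but $P_{\mathcal{E}\mathcal{F}} \neq P_{\mathcal{F}\mathcal{E}}$. Since matters are simplified when $P_{\mathcal{E}\mathcal{F}} = P_{\mathcal{F}\mathcal{E}}$, we note that the results of Section 3 imply that the following three conditions are equivalent: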
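(4.7a)  $P_{\mathcal{E}\mathcal{F}}(A) = P_{\mathcal{F}\mathcal{E}}(A)$ for all sets $A$.

(4.7b)  $P_{\mathcal{E}\mathcal{F}}(E_i) = P_{\mathcal{F}\mathcal{E}}(E_i)$ and $P_{\mathcal{E}\mathcal{F}}(F_j) = P_{\mathcal{F}\mathcal{E}}(F_j)$ for all $i$ and $j$.

(4.7c)  $P_{\mathcal{F}}(E_i) = P(E_i)$ and $P_{\mathcal{E}}(F_j) = P(F_j)$ for all $i$ and $j$.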
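
The order dependence behind question (4.5) is easy to see numerically. The following is a minimal sketch on a hypothetical $2 \times 2$ prior (the table, the targets `p` and `q`, and the function names are illustrative assumptions, not taken from the text). It applies Jeffrey's rule on the rows and then the columns, and in the reverse order, and prints the resulting marginals.

```python
def jeffrey_update(P, partition, new_probs):
    """Jeffrey's rule on a finite space: rescale P inside each cell so the
    cell receives its new probability (assumes every cell has P-mass > 0)."""
    updated = {}
    for cell, target in zip(partition, new_probs):
        mass = sum(P[w] for w in cell)
        for w in cell:
            updated[w] = P[w] * target / mass
    return updated

def prob(Q, event):
    return sum(Q[w] for w in event)

# Hypothetical prior on outcomes (row, column); rows and columns are dependent.
P = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
E = [{(0, 0), (0, 1)}, {(1, 0), (1, 1)}]   # partition by rows
F = [{(0, 0), (1, 0)}, {(0, 1), (1, 1)}]   # partition by columns
p, q = [0.7, 0.3], [0.6, 0.4]

P_EF = jeffrey_update(jeffrey_update(P, E, p), F, q)   # E first, then F
P_FE = jeffrey_update(jeffrey_update(P, F, q), E, p)   # F first, then E

print([prob(P_EF, Ei) for Ei in E], [prob(P_EF, Fj) for Fj in F])
print([prob(P_FE, Ei) for Ei in E], [prob(P_FE, Fj) for Fj in F])
```

In this example the last partition updated always carries its target probabilities, while the first one is disturbed: neither order satisfies (4.1), and the two results differ.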


Even when the order does not matter, we still have the responsibility of justifying the resort to successive updating, i.e., problem (4.6). One approach to this is via checking the Jeffrey condition at each stage of updating. This is a somewhat unorthodox mental exercise given that we currently believe (4.1), a condition involving both partitions. If we update first on $\mathcal{E}$, then we must check $P(A|E_i) = P^*(A|E_i)$, which amounts to thinking as if we don't know about $\mathcal{F}$ and are only thinking about $\mathcal{E}$. At the second stage, one then checks $P^*(A|F_j) = P_{\mathcal{E}}(A|F_j)$, comparing one's opinion not knowing $\mathcal{F}$ to one's opinion knowing $\mathcal{F}$. Examples such as Example 3.4 show that this can be tricky. It is a possible route, however, and one more general than the route using independence suggested before.


Remark 1. There is no need to have $P_{\mathcal{E}\mathcal{F}} = P_{\mathcal{F}\mathcal{E}}$ for successive updating to be useful and valid. If each of the (J) conditions is judged valid in forming $P_{\mathcal{E}\mathcal{F}}$ and if $P_{\mathcal{E}\mathcal{F}}$ satisfies (4.1), then $P_{\mathcal{E}\mathcal{F}}$ is a consistent quantification of current belief.


Remark 2. Condition (4.7) implies that $P_{\mathcal{E}\mathcal{F}}$ and $P_{\mathcal{F}\mathcal{E}}$ cannot both incorporate (4.1) and both be judged acceptable updates (in the sense that the (J) conditions have been checked) without $P_{\mathcal{E}\mathcal{F}} = P_{\mathcal{F}\mathcal{E}}$. Thus non-commutativity is not a real problem for successive Jeffrey updating.
5. MECHANICAL UPDATING

The approach we have taken thus far to justifying Jeffrey's rule is subjective - through checking condition (J). Several authors - Griffeath and Snell (1974), May and Harper (1976), Williams (1980), and van Fraassen (1980) - have pursued a different justification. Given a prior $P$, partition $\{E_i\}$, and a new measure $P^*$ on $\{E_i\}$, find the "closest" measure to $P$ which agrees with $P^*$ on the partition and take this as defining $P^*$ on the whole space. Since this way of proceeding does not attempt to quantify one's new degrees of belief via introspection, we call this approach mechanical updating.

5.1 Minimum Distance Properties

If "close" is defined in any of several common ways, the closest measure is that given in Jeffrey's rule. We illustrate this with three well known notions of closeness between measures $P$ and $Q$ on the countable set $\Omega$:

(5.1)  The variation distance

$\|P - Q\| = \sup\{\,|P(B) - Q(B)| : B \subset \Omega\,\}$.

Two  measures  are  close  in  variation  distance  if  they  are  uniformly  close  on 
all  subsets. 

(5.2)  The Hellinger distance

$H(P,Q) = \sum_{\omega} \left(\sqrt{P(\omega)} - \sqrt{Q(\omega)}\right)^2$.

(5.3)  The Kullback-Leibler number of $Q$ with respect to $P$

$I(Q,P) = \sum_{\omega} Q(\omega) \log\left(Q(\omega)/P(\omega)\right)$.

The variation and Hellinger distances are actual metrics on the space of probability distributions; the Kullback-Leibler number is not, being asymmetric in its arguments. Kailath (1967) and Csiszar (1977) are good surveys, with references, of the properties of (5.1), (5.2), and (5.3).


Theorem 5.1: Let $\Omega$ be a countable set, $P$ a probability on $\Omega$, and $\{E_i\}$ a partition of $\Omega$. Suppose $P^*(E_i) \ge 0$ are given numbers such that $\sum_i P^*(E_i) = 1$. Let $Q$ be a probability on $\Omega$ such that $Q(E_i) = P^*(E_i)$. Then


(5.4)  $\|Q - P\| \ge \frac{1}{2} \sum_i |P(E_i) - P^*(E_i)|$,

(5.5)  $H(Q,P) \ge \sum_i \left(\sqrt{P(E_i)} - \sqrt{P^*(E_i)}\right)^2$,

(5.6)  $I(Q,P) \ge \sum_i P^*(E_i) \log\left(P^*(E_i)/P(E_i)\right)$.

In (5.5) and (5.6) equality holds if and only if $Q(A) = \sum_i P(A|E_i)\,P^*(E_i)$.
Remarks. 1) Although the probability measure given by Jeffrey's rule minimizes the variation distance, it does not do so uniquely; see May (1976). 2) In Theorem 5.1, the minimum distance between $P$ and $Q$ is the distance between $P$ and $Q$ viewed as measures on the partition $\{E_i\}$. 3) A result like Theorem 5.1 holds for several other notions of distance; see Section 6 where a generalization of Theorem 5.1 is given.
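
The bounds of Theorem 5.1 can be checked numerically on a small space. The sketch below is illustrative only: the four-point prior, the partition, and the alternative measure `Q_alt` are hypothetical choices, and the variation distance is computed as half the sum of absolute differences of point masses, which coincides with the supremum in (5.1) for probability measures on a countable set.

```python
import math

def variation(P, Q):
    # sup_B |P(B) - Q(B)| equals half the L1 distance between point masses
    return 0.5 * sum(abs(P[w] - Q[w]) for w in P)

def hellinger(P, Q):
    return sum((math.sqrt(P[w]) - math.sqrt(Q[w])) ** 2 for w in P)

def kullback_leibler(Q, P):
    # I(Q, P) = sum Q log(Q / P), with the convention 0 log 0 = 0
    return sum(Q[w] * math.log(Q[w] / P[w]) for w in Q if Q[w] > 0)

def jeffrey_update(P, partition, new_probs):
    updated = {}
    for cell, target in zip(partition, new_probs):
        mass = sum(P[w] for w in cell)
        for w in cell:
            updated[w] = P[w] * target / mass
    return updated

def coarsen(P, partition):
    # P viewed as a measure on the partition: cell index -> cell mass
    return {i: sum(P[w] for w in cell) for i, cell in enumerate(partition)}

P = {"a": 0.1, "b": 0.3, "c": 0.2, "d": 0.4}          # hypothetical prior
partition = [{"a", "b"}, {"c", "d"}]
p_star = [0.7, 0.3]                                    # new cell probabilities

Q_jeffrey = jeffrey_update(P, partition, p_star)
Q_alt = {"a": 0.35, "b": 0.35, "c": 0.15, "d": 0.15}   # also has Q(E_1) = 0.7

P_bar = coarsen(P, partition)
P_star_bar = dict(enumerate(p_star))
bounds = (variation(P_bar, P_star_bar),
          hellinger(P_bar, P_star_bar),
          kullback_leibler(P_star_bar, P_bar))

for name, Q in [("Jeffrey", Q_jeffrey), ("alternative", Q_alt)]:
    print(name, variation(Q, P), hellinger(Q, P), kullback_leibler(Q, P))
print("lower bounds", bounds)
```

For these numbers the Jeffrey update attains all three bounds, while `Q_alt` is strictly worse in the Hellinger and Kullback-Leibler senses yet also attains the variation bound, in line with the non-uniqueness noted in Remark 1.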


5.2 I-Projections and the IPFP

Mechanical updating allows the possibility of updating on more general collections of sets than partitions. Suppose we want to adopt new degrees of belief $P^*(E_i) = p_i$, $1 \le i \le n$, where $\mathcal{E} = \{E_1, E_2, \ldots, E_n\}$ is not necessarily a partition of $\Omega$. This situation is closely related to Jeffrey's proposal of updating simultaneously on several partitions, mentioned in Section 4, inasmuch as updating simultaneously on partitions $\mathcal{E}_1, \mathcal{E}_2, \ldots, \mathcal{E}_k$ is the same as updating on $\mathcal{E} = \bigcup_{i=1}^{k} \mathcal{E}_i$. Conversely, updating on $\mathcal{E} = \{E_1, \ldots, E_n\}$ can be viewed as updating simultaneously on the partitions $\mathcal{E}_1 = \{E_1, E_1^c\}$, $\mathcal{E}_2 = \{E_2, E_2^c\}$, ..., $\mathcal{E}_n = \{E_n, E_n^c\}$. In general, the set $C = \{Q : Q(E_i) = p_i \text{ for all } i\}$ is a convex set of probability measures on $\Omega$ which can be empty, contain a single element, or contain many elements. In the first case $P^*$ is incoherent; in the second, $P^*$ is uniquely determined. When the third case holds, we can use the Kullback-Leibler number as a notion of "distance" to pick a unique member of $C$ closest to $P$.

Theorem 5.2: Let $S(P,\infty) = \{Q : I(Q,P) < \infty\}$. If $S(P,\infty) \cap C \neq \emptyset$, then there exists a unique element $Q_I \in C$ such that $I(Q_I,P) = \inf\{I(Q,P) : Q \in C\}$.

Proof: This is an immediate consequence of Csiszar (1975, Theorem 2.1), $C$ being convex and closed with respect to the variation distance. □

In Csiszar's terminology, $Q_I$ is the I-projection of $P$ onto $C$. (The term is meant to suggest the projection of a vector in $R^n$ onto a subspace.) The I-projection is closely related to a widely used technique in the statistical analysis of contingency tables.

A  standard  method  of  adjusting  an  r  x  c  contingency  table  so  that  it 
has  given  marginal  totals  is  the  iterated  proportional  fitting  procedure 
(IPFP) .  In  this,  one  first  adjusts  the  table  to  have  specified  row  sums, 
say  (by  dividing  the  numbers  of  a  given  row  by  the  appropriate  factor) ,  next 
adjusts  the  new  table  to  have  the  right  column  sums,  and  then  continues 
iteratively. It follows from Csiszar (1975, Theorem 3.2) that this procedure
converges to the I-projection of the initial table onto the set of
tables with the specified row and column sums. That is, the IPFP finds
the "closest" table to the original table with the prescribed margins
(provided, of course, this set is nonempty). This is essentially the same
as  finding  the  closest  measure  to  an  initial  probability  with  prescribed 
values  on  two  partitions. 
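
The row and column adjustments just described can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not code from the text: the table is a list of lists of nonnegative numbers, the function name `ipfp` and the fixed iteration count are hypothetical choices, and no convergence test is performed.

```python
def ipfp(table, row_targets, col_targets, iterations=200):
    """Iterated proportional fitting for an r x c table.

    Alternately rescales rows and columns so that they match the
    target sums; rows or columns with zero mass are left unchanged.
    """
    t = [list(row) for row in table]           # work on a copy
    for _ in range(iterations):
        for i, target in enumerate(row_targets):
            s = sum(t[i])
            if s > 0:
                t[i] = [x * target / s for x in t[i]]
        for j, target in enumerate(col_targets):
            s = sum(row[j] for row in t)
            if s > 0:
                for row in t:
                    row[j] *= target / s
    return t

# Uniform 2 x 2 table fitted to margins (1/3, 2/3) in both directions.
start = [[0.25, 0.25], [0.25, 0.25]]
fitted = ipfp(start, [1 / 3, 2 / 3], [1 / 3, 2 / 3])
print(fitted)
```

Because the starting table is independent, the fitted table is the product of the target margins, with entries 1/9, 2/9, 2/9 and 4/9.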

The IPFP can be used to compute $Q_I$ of Theorem 5.2 by treating the problem as an n-dimensional contingency table with given margins $P^*(E_i)$, $1 - P^*(E_i)$.

5.3  Comparing  Different  Metrics 

Theorem 5.1 suggests that Jeffrey's rule is an uncontroversial form of mechanical updating in the sense that it agrees with virtually every minimum distance rule. As noted above, in the case of two or more partitions, the I-projection or maximum entropy solution can be viewed as a limiting form of successive Jeffrey updating. This is perhaps of some interest inasmuch as mechanical updating via the other minimum distance methods need not, in general, yield the same answer as the I-projection.

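Example 5.1. (I-projection $\neq$ minimum variation distance.) Consider passing from an initial table

$$P^0 = \begin{array}{cc} 1/4 & 1/4 \\ 1/4 & 1/4 \end{array}$$

to

$$P = \begin{array}{c|cc} & 1/3 & 2/3 \\ \hline 1/3 & p_1 & p_2 \\ 2/3 & p_3 & p_4 \end{array},$$

a new table with the specified margins and which is otherwise as "close" to the original table as possible, according to some notion of closeness.

a) The table $P^I$ given by $p_1 = \frac{1}{9}$, $p_2 = p_3 = \frac{2}{9}$, $p_4 = \frac{4}{9}$ minimizes $I(P,P^0)$ since $P^0$ is independent and I-projections preserve the association factor of a $2 \times 2$ table (see, e.g., Mosteller (1968), p. 3). The variation distance for this table is

$$\|P^I - P^0\| = \frac{1}{2} \sum_{i=1}^{4} \left|p_i - \tfrac{1}{4}\right| = \frac{1}{2}\left\{ \left|\tfrac{1}{9} - \tfrac{1}{4}\right| + 2\left|\tfrac{2}{9} - \tfrac{1}{4}\right| + \left|\tfrac{4}{9} - \tfrac{1}{4}\right| \right\} = \frac{7}{36}.$$

b) To find the table $P^V$ with minimum variation distance from $P^0$, subject to the margin constraints, note that given $p_1$, one has $p_2 = p_3 = \frac{1}{3} - p_1$ and $p_4 = p_1 + \frac{1}{3}$. Hence

$$\|P - P^0\| = \frac{1}{2} \sum_{i=1}^{4} \left|p_i - \tfrac{1}{4}\right| = \frac{1}{2}\left\{ \left|\tfrac{1}{4} - p_1\right| + 2\left|\tfrac{1}{12} - p_1\right| + \left|-\tfrac{1}{12} - p_1\right| \right\}$$

which is minimized by $p_1 = \frac{1}{12}$, the median of $-\frac{1}{12}, \frac{1}{12}, \frac{1}{12}, \frac{1}{4}$. Hence

$$P^V = \left(\tfrac{1}{12}, \tfrac{1}{4}, \tfrac{1}{4}, \tfrac{5}{12}\right) \quad \text{and} \quad \|P^V - P^0\| = \frac{1}{6}.$$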
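
The arithmetic of Example 5.1 can be reproduced with exact fractions. The sketch below is a check written for this purpose (the helper names and the search grid are assumptions): it parametrizes the tables with the required margins by $p_1$, evaluates the variation distance to the uniform table, and compares the Kullback-Leibler numbers of the two candidate tables.

```python
import math
from fractions import Fraction as Fr

def table_from_p1(p1):
    # Margins (1/3, 2/3) on rows and columns force p2 = p3 = 1/3 - p1, p4 = p1 + 1/3.
    third = Fr(1, 3)
    return [p1, third - p1, third - p1, p1 + third]

def variation(P, Q):
    return sum(abs(a - b) for a, b in zip(P, Q)) / 2

def kullback_leibler(Q, P):
    return sum(float(a) * math.log(a / b) for a, b in zip(Q, P) if a > 0)

P0 = [Fr(1, 4)] * 4
PI = table_from_p1(Fr(1, 9))     # the I-projection of part a)
PV = table_from_p1(Fr(1, 12))    # the minimum variation table of part b)

# Search p1 over a grid on [0, 1/3] that contains both 1/9 and 1/12.
grid = [Fr(k, 360) for k in range(0, 121)]
best_p1 = min(grid, key=lambda p1: variation(table_from_p1(p1), P0))

print(variation(PI, P0), variation(PV, P0), best_p1)
print(kullback_leibler(PI, P0), kullback_leibler(PV, P0))
```

The grid search returns $p_1 = 1/12$, the variation distances are 7/36 and 1/6, and the Kullback-Leibler number is smaller for $P^I$ than for $P^V$, so the two notions of closeness select different tables.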


There  has  been  considerable  interest  recently  in  maximum  entropy  methods, 
especially  in  the  philosophical  literature  (Rosenkrantz  (1979),  Williams 
(1980),  van  Fraassen  (1980)).  Example  5.1  suggests  that  any  claims  to  the 
effect  that  maximum  entropy  revision  is  the  only  correct  route  to  probability 
revision  should  be  viewed  with  considerable  caution  because  of  its  strong 
dependence  on  the  measure  of  closeness  being  used. 
6. ABSTRACT PROBABILITY KINEMATICS


In this section we briefly discuss the generalization of Jeffrey's rule of conditioning from the countable setting to general spaces. The need for such a generalization is shown by passing to the limit in the example of Section 1.1.

Example 6.1. Consider an infinite sequence of zero or one outcomes $X_1, X_2, \ldots$. Suppose that the joint distribution of $\{X_i\}$ is exchangeable and set $S_n = X_1 + \cdots + X_n$. Then, as shown by de Finetti, the limit

$$Z = \lim_{n \to \infty} \frac{S_n}{n}$$

exists almost surely and

$$P(S_n = k \mid Z = p) = \binom{n}{k} p^k (1-p)^{n-k}.$$

One consequence of de Finetti's theorem is that one may decide on a subjective probability distribution for an infinite exchangeable sequence of coin tosses by introspecting on the "prior distribution" $P\{Z \in dp\} = d\mu(p)$. In the example of Section 1.1, the effect of the informant's information could be taken into account by choosing a new prior $d\mu^*(p)$ and Jeffrey's rule becomes:

$$P^*(S_n = k) = \int \binom{n}{k} p^k (1-p)^{n-k}\, d\mu^*(p).$$

This illustrates the use of Jeffrey updating via a continuous "sufficient statistic" rather than a countable "sufficient partition". The generalization we use replaces partitions by σ-algebras.
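
As a concrete illustration of the mixture formula, suppose the new prior $\mu^*$ is a Beta$(a, b)$ distribution. This choice is an assumption made here for the sake of a closed form; the text does not prescribe any particular $\mu^*$. The integral then equals $\binom{n}{k} B(k+a,\, n-k+b)/B(a,b)$, which the sketch below evaluates for three tosses. The parameters 81 and 21 are a hypothetical encoding of a uniform prior combined with the informant's 80 heads in 100 spins.

```python
import math

def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def mixture_binomial(n, k, a, b):
    """P*(S_n = k) when the new prior mu* is Beta(a, b):
    C(n, k) * B(k + a, n - k + b) / B(a, b)."""
    return math.comb(n, k) * math.exp(log_beta(k + a, n - k + b) - log_beta(a, b))

# Distribution of the number of heads in three tosses under two priors.
uniform_prior = [mixture_binomial(3, k, 1, 1) for k in range(4)]
revised_prior = [mixture_binomial(3, k, 81, 21) for k in range(4)]
print(uniform_prior)
print(revised_prior)
```

Under the uniform prior each value of $S_3$ has probability 1/4; under the revised prior the mass shifts toward three heads.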


Consider a probability space $(\Omega, \mathcal{A}, P)$, thought of as describing our current subjective beliefs about the σ-algebra of events $\mathcal{A}$. Let $P^*$ be a new probability measure on $\mathcal{A}$ and $\mathcal{A}_0 \subset \mathcal{A}$ a sub-σ-algebra of $\mathcal{A}$. Let $C$ be an $\mathcal{A}_0$-measurable set such that $P(C) = 0$ and $\bar{P}^* \ll \bar{P}$ on $\Omega - C$, where $\bar{P}$, $\bar{P}^*$ are the restrictions of $P$, $P^*$ to $\mathcal{A}_0$. The appropriate version of Jeffrey's condition (J) is:

(J')  $\mathcal{A}_0$ is sufficient for $\{P, P^*\}$.

When condition (J') holds, Jeffrey's rule of conditioning becomes:

(6.1)  $P^*(A) = \int_{\Omega - C} P(A \mid \mathcal{A}_0)\, \bar{P}^*(d\omega) + P^*(A \cap C)$

where $P(A \mid \mathcal{A}_0)$ is the conditional probability of $A$ given $\mathcal{A}_0$. If $P^* \ll P$, we can take $C = \emptyset$.

Much of the mathematical machinery for dealing with Jeffrey conditionalization in this generality has been developed (for a different purpose) by Csiszar (1967). His Lemma 2.2 translates into a likelihood ratio version of Jeffrey's rule (compare (2.2)): Let $\lambda$ be a σ-finite measure which dominates $P$, $P^*$. Let $\bar{\lambda}$, $\bar{P}$, $\bar{P}^*$ be the restrictions to $\mathcal{A}_0$. Assume $\bar{\lambda}$ is σ-finite. Let $p(x)$, $p^*(x)$ be the densities of $P$, $P^*$ with respect to $\lambda$ and $\bar{p}$, $\bar{p}^*$ the densities of $\bar{P}$, $\bar{P}^*$ with respect to $\bar{\lambda}$. If condition (J') holds, then:

(6.2)  $p^*(x) = \begin{cases} p(x)\, \bar{p}^*(x)/\bar{p}(x) & \text{if } \bar{p}(x) > 0 \\ p^*(x) & \text{if } \bar{p}(x) = 0. \end{cases}$

Identity (6.2) is a version of the Fisher-Neyman factorization theorem (see Halmos and Savage (1949)).


Csiszar's results allow us to give a single theorem which includes Theorem 5.1, showing that the closest measure to $P$ which agrees with $P^*$ on $\mathcal{A}_0$ is the measure given by (6.1). Csiszar has introduced the notion of f-divergence, where $f$ is a convex function defined in the interval $(0,\infty)$. If $\mu_1$ and $\mu_2$ are two measures on $(\Omega, \mathcal{A})$, the f-divergence of $\mu_1$ and $\mu_2$ is

$$I_f(\mu_1, \mu_2) = \int p_2(x)\, f\!\left(\frac{p_1(x)}{p_2(x)}\right) \lambda(dx)$$

where $\mu_i \ll \lambda$ and $p_i = d\mu_i/d\lambda$ ($i = 1, 2$). Taking $f(u) = u \log u$ gives the Kullback-Leibler number, $f(u) = (u^{1/2} - 1)^2$ the Hellinger distance, $f(u) = |u - 1|$ the variation distance. Csiszar shows that several other notions of distance are also f-divergences for an appropriate $f$.
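
On a finite space with counting measure as $\lambda$, the f-divergence reduces to a sum, and the three choices of $f$ can be compared with the direct formulas (5.1) to (5.3). The sketch below is illustrative: the two measures are hypothetical and are assumed to have strictly positive point masses, and the function names are not from the text. Note that with $f(u) = |u - 1|$ the sum is the full $L_1$ distance, which is twice the supremum form of (5.1).

```python
import math

def f_divergence(f, Q, P):
    """I_f(Q, P) = sum_x P(x) f(Q(x) / P(x)), assuming P(x) > 0 for all x."""
    return sum(P[x] * f(Q[x] / P[x]) for x in P)

def f_kl(u):
    return u * math.log(u) if u > 0 else 0.0

def f_hellinger(u):
    return (math.sqrt(u) - 1.0) ** 2

def f_variation(u):
    return abs(u - 1.0)

P = {"a": 0.1, "b": 0.3, "c": 0.2, "d": 0.4}      # hypothetical measures
Q = {"a": 0.35, "b": 0.35, "c": 0.15, "d": 0.15}

kl_direct = sum(Q[x] * math.log(Q[x] / P[x]) for x in P)
hellinger_direct = sum((math.sqrt(P[x]) - math.sqrt(Q[x])) ** 2 for x in P)
l1_direct = sum(abs(P[x] - Q[x]) for x in P)

print(f_divergence(f_kl, Q, P), kl_direct)
print(f_divergence(f_hellinger, Q, P), hellinger_direct)
print(f_divergence(f_variation, Q, P), l1_direct)
```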

Theorem 6.1: Let $C$ be the set of probability measures on $(\Omega, \mathcal{A})$ which agree with $P^*$ on $\mathcal{A}_0$, and $f$ a convex function on $(0,\infty)$. Then under condition (J'),

(6.3)  $I_f(P^*, P) = I_f(\bar{P}^*, \bar{P}) = \inf\{I_f(Q,P) : Q \in C\}$.

If $f$ is strictly convex, then $P^*$ is the unique probability measure on $\mathcal{A}$ which attains the infimum on the right-hand side of (6.3).

Proof: The first equality follows from the sufficiency of $\mathcal{A}_0$ for $\{P, P^*\}$, the second from Csiszar's (1967, Section 3) version of the minimum information discrimination theorem of Kullback and Leibler: $I_f(Q,P) \ge I_f(\bar{Q}, \bar{P})$. Since $I_f(\bar{Q}, \bar{P}) = I_f(\bar{P}^*, \bar{P})$ for $Q \in C$, (6.3) follows. If $f$ is strictly convex, $I_f(\cdot, P)$ is also, and the theorem follows. □
APPENDIX


We  first  prove  a  slight  generalization  of  Theorem  4.1: 

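Theorem 4.2: Let $\Omega$ be a countable set, $\mathcal{S}$ a σ-algebra of subsets, $\mathcal{A}$ and $\mathcal{B}$ sub-σ-algebras of $\mathcal{S}$, and $\mu$ and $\nu$ probability measures on $\mathcal{A}$ and $\mathcal{B}$ respectively. A necessary and sufficient condition for there to exist a probability measure $P$ on $(\Omega, \mathcal{S})$ such that $P$ equals $\mu$ on $\mathcal{A}$ and $P$ equals $\nu$ on $\mathcal{B}$ is

(4.8)  for each $A \in \mathcal{A}$ and $B \in \mathcal{B}$ such that $A \cap B = \emptyset$,

$\mu(A) + \nu(B) \le 1$.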

Proof: The condition is clearly necessary. To prove sufficiency, let $\{A_i\}_{i=1}^{\infty}$ be the atoms of $\mathcal{A}$ and $\{B_j\}_{j=1}^{\infty}$ be the atoms of $\mathcal{B}$. Let $\Omega_a = \{A_i\}_{i=1}^{\infty}$ and $\Omega_b = \{B_j\}_{j=1}^{\infty}$, thought of as discrete topological spaces. In $\Omega_a \times \Omega_b$, consider the set $F = \bigcup_{A_i \cap B_j \neq \emptyset} A_i \times B_j$. This is a closed set in $\Omega_a \times \Omega_b$ and according to Theorem 11 of Strassen (1965) a necessary and sufficient condition for there to exist a probability measure $\gamma$ on $F$ such that $\gamma$ has margins $\mu$ and $\nu$ is that for every $B \in \mathcal{B}$,

(4.9)  $\nu(B) \le \mu(\pi_a(F \cap \Omega_a \times B))$

where $\pi_a$ is the projection of a set into its first coordinate. Clearly $\pi_a(F \cap \Omega_a \times B) = \bigcup_{A_i \cap B \neq \emptyset} A_i$ is the smallest $\mathcal{A}$-measurable set containing $B$. Thus Strassen's condition (4.9) is satisfied if and only if

(4.10)  whenever $A \in \mathcal{A}$, $B \in \mathcal{B}$ and $B \subset A$, $\nu(B) \le \mu(A)$.

Condition (4.10) is equivalent to (4.8). Hence Strassen's theorem gives a measure $\gamma$ which may be regarded as a measure on the partition $\{A_i \cap B_j\}$. Since $\Omega$ is countable, $\gamma$ can clearly be extended to a measure on all of $\Omega$ and then restricted to a measure $P$ on $\mathcal{S}$ with the desired properties. □

In the proof of Theorem 4.2, we have used Strassen's theorem, which itself uses the Hahn-Banach theorem. When the two partitions are both composed of a finite number of sets, Hansel and Troallic (1978) have shown that Strassen's theorem follows from the max flow-min cut theorem. There are efficient algorithms for finding maximum flows, and hence for checking (4.2), in Bondy and Murty (1976), Chapter 11.
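
One way the max-flow remark could be turned into a computation is sketched below. The network construction is an assumption of this illustration, namely the standard transportation-style reduction, and is not taken from the cited references: a source feeds each cell $E_i$ with capacity $p_i$, each cell $F_j$ drains to a sink with capacity $q_j$, and $E_i$ is joined to $F_j$ exactly when $E_i \cap F_j \neq \emptyset$. A flow of total value 1 then supplies the masses $P^*(E_i \cap F_j)$. The function name is hypothetical, exact `Fraction` inputs are assumed, and both `p` and `q` are assumed to sum to the same total.

```python
from collections import deque
from fractions import Fraction

def joint_with_margins(E, F, p, q):
    """Return {(i, j): mass} with row sums p and column sums q, supported on
    pairs with E[i] & F[j] nonempty, or None if no such table exists."""
    m, n = len(E), len(F)
    s, t = 0, m + n + 1          # nodes: source, E-cells 1..m, F-cells m+1..m+n, sink
    p = [Fraction(x) for x in p]
    q = [Fraction(x) for x in q]
    total = sum(p)
    cap = {}                     # residual capacities
    adj = {v: set() for v in range(m + n + 2)}

    def add_edge(u, v, c):
        cap[(u, v)] = c
        cap.setdefault((v, u), Fraction(0))
        adj[u].add(v)
        adj[v].add(u)

    for i in range(m):
        add_edge(s, 1 + i, p[i])
    for j in range(n):
        add_edge(1 + m + j, t, q[j])
    for i in range(m):
        for j in range(n):
            if E[i] & F[j]:
                add_edge(1 + i, 1 + m + j, total)   # effectively unbounded

    flow = Fraction(0)
    while True:
        # breadth-first search for a shortest augmenting path
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cap[e] for e in path)
        for (u, v) in path:
            cap[(u, v)] -= bottleneck
            cap[(v, u)] += bottleneck
        flow += bottleneck

    if flow != total:
        return None
    # the flow on edge E_i -> F_j equals the residual capacity of its reverse edge
    return {(i, j): cap[(1 + m + j, 1 + i)]
            for i in range(m) for j in range(n) if E[i] & F[j]}

E = [{0, 1}, {2, 3}]
F = [{0}, {1, 2, 3}]
half = Fraction(1, 2)
print(joint_with_margins(E, F, [half, half], [Fraction(1, 5), Fraction(4, 5)]))
print(joint_with_margins(E, F, [half, half], [Fraction(7, 10), Fraction(3, 10)]))
```

The first call returns a table on the nonempty intersections; the second returns `None` because the margins violate (4.2).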

Finally,  we  consider  the  extension  of  Theorem  4.2  to  more  general 
spaces. 

Let $(\Omega, \mathcal{S})$ be a measurable space, let $\mathcal{A}$, $\mathcal{B}$ be sub-σ-algebras of $\mathcal{S}$ and let $\mu$ and $\nu$ be probability measures on $\mathcal{A}$ and $\mathcal{B}$ respectively. When does there exist a probability measure $P$ on $\sigma(\mathcal{A}, \mathcal{B})$ such that $P$ restricts to $\mu$ on $\mathcal{A}$ and $\nu$ on $\mathcal{B}$? We have argued above that if $\Omega$ is countable, then a necessary and sufficient condition is

(4.11)  $\forall A \in \mathcal{A},\ B \in \mathcal{B},\ A \cap B = \emptyset \Rightarrow \mu(A) + \nu(B) \le 1$.

It is easy to show that (4.11) is in fact necessary and sufficient for the existence of a finitely additive measure $\hat{P}$ on the algebra generated by $\mathcal{A}$ and $\mathcal{B}$ (and hence on the algebra of all subsets of $\Omega$) which restricts to $\mu$ on $\mathcal{A}$ and $\nu$ on $\mathcal{B}$, even if $\mathcal{A}$ and $\mathcal{B}$ are merely algebras. Briefly, one considers the following linear subspace of bounded real valued functions from $\Omega$: $L = \{f + g : f \text{ is } \mathcal{A}\text{-measurable and } g \text{ is } \mathcal{B}\text{-measurable}\}$, and extends the positive linear functional $\ell(f + g) = \mu(f) + \nu(g)$ using the Hahn-Banach theorem. Condition (4.11) is then used to show $\ell$ is well defined.
We now present an example due to David Freedman and Jim Pitman which shows that condition (4.11) is not sufficient to ensure that a countably additive extension of $\mu$ and $\nu$ exists. The example shows a bit more: it shows that Theorem 11 of Strassen (1965) cannot be extended to give conditions for a measure on an $F_\sigma$ set of the unit square to have given margins.

Example 4.2. (D. Freedman, J. Pitman): There exists an $F_\sigma$ set $K$ in the unit square and a finitely additive probability $\pi$ on $K$ which has marginal projections equal to Lebesgue measure on each coordinate but such that $K$ supports no countably additive probability with these margins.

Remark. Taking $\Omega = K$,

$\mathcal{A} = \{(A \times [0,1]) \cap K : A \text{ is a Borel set of } [0,1]\}$,

$\mathcal{B} = \{([0,1] \times B) \cap K : B \text{ is a Borel set of } [0,1]\}$,

with $\mu$ and $\nu$ as Lebesgue measure on $\mathcal{A}$ and $\mathcal{B}$ respectively, gives a situation in which (4.11) is satisfied (since a finitely additive refinement exists) but no countably additive refinement exists.

Proof: To construct $K$, let $a_n$ be a sequence of numbers in $(0,1)$ with $a_n \uparrow 1$. Let $\ell_n$ be the line in the unit square connecting $(0,0)$ to $(1, a_n)$. Let $K = \bigcup_n \ell_n$. Note that $K$ does not include the diagonal. To construct $\pi$, let $\pi_n$ be Lebesgue measure on the Borel sets of $\ell_n$. Let $\rho$ be any finitely additive probability measure defined on all subsets of the integers $\{1, 2, 3, \ldots\}$ such that $\rho$ is zero on finite subsets. Let $\pi(S) = \int \pi_n(S)\, \rho(dn)$. Each $\pi_n$, considered as a probability on $K$, projects to Lebesgue measure on the x-axis of the unit square. Further, the projection of $\pi_n$ onto the y-axis of the unit square gives Lebesgue measure restricted to the set $\{(0,y) : 0 \le y \le a_n\}$. It follows easily that the y-axis margin of $\pi$ is Lebesgue measure. It only remains to argue that $K$ does not support a countably additive probability measure $P$ which projects to Lebesgue measure. If $X : K \to [0,1]$ and $Y : K \to [0,1]$ are the two projections, then $E(X) = E(Y) = \frac{1}{2}$ because each of $X$ and $Y$ is uniformly distributed by construction. Any countably additive $P$ would have to put positive probability on some line $\ell_n$, and since $y < x$ for all $(x,y) \in K$ other than the origin, this forces $E(X) > E(Y)$. The contradiction shows $P$ cannot exist. □